Development of a Speech Recognition System for Spanish Broadcast News

Authors

  • Andreea Niculescu
  • Franciska de Jong
Abstract

One application of automatic speech recognition (ASR) is the generation of transcripts to facilitate searching through multimedia collections containing spoken data. In the broadcast news domain in particular, ASR systems have been successfully deployed to index large collections of news, for two reasons: first, retrieval performed on ASR-generated transcripts with a word error rate (WER) under 50% gives reasonable results [1]; second, ASR systems nowadays achieve high performance on broadcast news data, where WERs below 10% are no longer unusual [2][3]. In the MESH project [4], whose goal is to extract, compare, and combine multimedia content (audio, video, and text) from multiple news sources, ASR modules for three languages (Spanish, German, and English) are going to be integrated to generate transcripts of broadcast news data. This report presents the setup and evaluation of a speech recognition system for Spanish broadcast news. Section 3 gives a short overview of the basic components of an ASR system. Section 4 describes the development and training process of acoustic and language models for the Spanish ASR. The performance evaluation results are discussed in Section 5. The report ends with conclusions and suggestions for future work.
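The abstract quotes WER thresholds without defining the metric. As a minimal sketch, assuming the standard definition (substitutions, deletions, and insertions from a word-level edit distance, divided by the number of reference words), WER could be computed as follows. The function name and the Spanish example sentences are illustrative and not taken from the report, and the report's own scoring tool may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the report): splits on whitespace and
    applies no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("habló" -> "hablo") and one deletion ("en").
wer = word_error_rate(
    "el presidente habló ayer en madrid",
    "el presidente hablo ayer madrid",
)
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, WER under this definition can exceed 100% when the hypothesis is much longer than the reference.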


Similar resources

Spanish broadcast news transcription

We describe the Sail Labs Media Mining System (MMS) aimed at the transcription of Castilian Spanish broadcast news. In contrast to previous systems, the focus of this system is on Spanish as spoken on the Iberian Peninsula as opposed to the Americas. We discuss the development of a Castilian Spanish broadcast-news corpus suitable for training the various system components of the MMS and report o...


Real-time live broadcast news subtitling system for Spanish

Subtitling of live broadcast news is a very important application for meeting the needs of deaf and hard-of-hearing people. However, live subtitling is a high-cost operation in terms of qualified human resources, and thus money, if high precision is desired. Automatic Speech Recognition researchers can help perform this task, saving both time and money, by developing systems that deliver subtitle...


Statistical Machine Translation of Broadcast News from Spanish to Portuguese

In this paper we describe the work carried out to develop an automatic system for the translation of broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish news and Statistical Machine Translation (SMT) of the results into Portuguese. ASR of broadcast news is based on the AUDIMUS...


Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


The L2F Broadcast News Speech Recognition System

Broadcast news plays an important role in our lives, providing access to news, information and entertainment. The existence of an automatic transcription is an important medium, not only because it can provide subtitles for the inclusion of people with special needs or be an advantage in noisy and populated environments, but also because it enables data search and retrieval capabilities over the multimedia s...




Publication date: 2008